332:525 Homework Set 1


Estimation Problems

1. Recursive Least-Squares (RLS) Estimators: Consider a sequence of iid random variables $x_n$, $n = 0, 1, 2, \dots$, and form the running average of the first $n+1$ numbers, considered as an estimate of the mean $m = E[x_n]$:
$$\hat m_n = \frac{x_0 + x_1 + \cdots + x_n}{n+1}$$
(a) Show that $\hat m_n$ is the optimum solution that minimizes the sum of squares:
$$\mathcal{E}_n = \sum_{k=0}^{n} (x_k - \hat m)^2$$
What is the minimized value of $\mathcal{E}_n$?
(b) Moreover, show that $\hat m_n$ can be re-expressed in the following time-recursive forms, with the second being a Kalman-type predictor/corrector form:
$$\hat m_n = \Big(\frac{n}{n+1}\Big)\hat m_{n-1} + \Big(\frac{1}{n+1}\Big)x_n$$
$$\hat m_n = \hat m_{n-1} + \frac{1}{n+1}\,(x_n - \hat m_{n-1})$$
Note that these recursions connect the optimum least-squares solutions of two different performance indices. Indeed, $\hat m_n$ minimizes the performance index $\mathcal{E}_n$, whereas $\hat m_{n-1}$ minimizes $\mathcal{E}_{n-1}$, which runs up to time $k = n-1$.
(c) Show that $\hat m_n$ is an unbiased estimator of the mean. Determine the variance of $\hat m_n$, that is, the quantity $\operatorname{var}(\hat m_n) = E[(\hat m_n - m)^2]$, and show that $\hat m_n$ is a consistent estimator of the mean. Hint: Show first that
$$\hat m_n - m = \frac{1}{n+1}\sum_{k=0}^{n}(x_k - m)$$
and use the assumption that the $x_n$ are iid, which implies the decorrelation condition $E[(x_i - m)(x_j - m)] = \sigma_x^2\,\delta_{ij}$.

2. RLS Estimators with Forgetting Factor: The RLS estimator $\hat m_n$ of the previous problem is appropriate for stationary sequences, that is, sequences whose statistical characteristics do not change over time. Indeed, the performance index $\mathcal{E}_n$ treats all time samples, from the earliest to the latest, on an equal footing. Initially, the estimator $\hat m_n$ converges very fast to the optimum value $m$ and then gets stuck at that value, because the Kalman-type gain factor $1/(n+1)$ that appears in the time-update becomes extremely small with increasing $n$. If there is a non-stationary change in the statistics and the mean $m$ changes to a new value, the estimator $\hat m_n$ will have a very hard time tracking this change. A more appropriate estimator for tracking non-stationary changes in the statistics would be one that places more emphasis on the more recent data and less on the older data. For example, the following weighted version of $\mathcal{E}_n$ emphasizes the current samples more and forgets the older ones exponentially fast:
$$\mathcal{E}_n = \sum_{k=0}^{n} \lambda^{n-k}(x_k - \hat m)^2$$
where the forgetting factor $\lambda$ must satisfy $0 < \lambda \le 1$. Note that $\lambda = 1$ recovers the above stationary case.
(a) Determine the optimum $\hat m$ that minimizes $\mathcal{E}_n$ and cast it in a time-recursive form such as
$$\hat m_n = \hat m_{n-1} + b_n\,(x_n - \hat m_{n-1})$$
How does $b_n$ behave in the limit $\lambda \to 1$? Show that $\hat m_n$ is an asymptotically unbiased estimator of $m$.
(b) Show that for fairly large values of $n$ and for $\lambda < 1$, the estimator satisfies the first-order difference equation (otherwise known as a first-order smoother):
$$\hat m_n = \lambda\hat m_{n-1} + (1-\lambda)\,x_n \qquad (1)$$

3. RLS Estimators with Forgetting Factor: The first-order smoother estimator of Eq. (1) was obtained for fairly large values of $n$. However, it can be thought of as a third type of estimator in its own right. Assume, therefore, that Eq. (1) defines $\hat m_n$ for all $n \ge 0$. Show that it is asymptotically unbiased but not consistent. Indeed, show that in the limit of large $n$, the variance of $\hat m_n$ tends to the finite value
$$\operatorname{var}(\hat m_n) = E\big[(\hat m_n - E[\hat m_n])^2\big] \;\longrightarrow\; \frac{1-\lambda}{1+\lambda}\,\sigma_x^2$$
However, by choosing $\lambda$ close to 1, this variance can be made as small as desired, thus providing a good estimator. The trade-off is that the closer $\lambda$ is to 1, the more sluggish the estimator becomes in tracking non-stationarities.
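The recursions in Problems 1 and 2 are easy to sanity-check numerically. The following is a minimal sketch (not part of the assignment), assuming NumPy is available; the sample size, the true mean, and the forgetting factor are arbitrary illustrative choices.

```python
# Numerical check: the recursive running average of Problem 1(b) matches the
# batch average, and the exact gain of Problem 2(a) tends to the first-order
# smoother of Eq. (1) for large n.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=1.0, size=200)     # iid samples with mean m = 5

lam = 0.95                                       # forgetting factor, 0 < lam <= 1
m_rls, m_ew = 0.0, 0.0
for n, xn in enumerate(x):
    # Problem 1(b): Kalman-type predictor/corrector form of the running average
    m_rls = m_rls + (xn - m_rls) / (n + 1)
    # Problem 2(a): exact gain (1 - lam)/(1 - lam**(n+1)); for large n this
    # approaches (1 - lam), i.e. the smoother of Eq. (1)
    b_n = (1 - lam) / (1 - lam ** (n + 1))
    m_ew = m_ew + b_n * (xn - m_ew)

print(m_rls, x.mean())    # recursive and batch averages coincide
print(m_ew)               # exponentially weighted estimate, near 5 but noisier
```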

4. Least-Mean-Square (LMS) Estimators: Consider the theoretical performance index
$$\mathcal{E}(\hat m) = E[(x - \hat m)^2] \qquad (2)$$
(a) Differentiating it with respect to $\hat m$, show that $\mathcal{E}$ is minimized for the optimum value of the parameter $\hat m = m = E[x]$.
(b) The LMS algorithm is based on the idea of steepest descent, in which $\hat m$ is changed iteratively so that at each iteration the performance index $\mathcal{E}$ is decreased and eventually reaches its minimum value. The key condition is to demand that going from one value of $\hat m$ to the next, say $\hat m + \Delta\hat m$, will result in a smaller performance index, that is, $\mathcal{E}(\hat m + \Delta\hat m) \le \mathcal{E}(\hat m)$. This can be guaranteed by choosing the change $\Delta\hat m$ to be proportional to the negative gradient of $\mathcal{E}$, that is, with $\mu > 0$,
$$\Delta\hat m = -\mu\,\frac{\partial\mathcal{E}}{\partial\hat m} \qquad \text{(LMS update)}$$
Replace the theoretical gradient by the instantaneous one:
$$\frac{\partial\mathcal{E}}{\partial\hat m} \;\longrightarrow\; \frac{\partial\mathcal{E}_n}{\partial\hat m} = -2(x_n - \hat m_n)$$
Apply the LMS update to the instantaneous gradient, that is,
$$\hat m_{n+1} = \hat m_n + \Delta\hat m_n = \hat m_n - \mu\,\frac{\partial\mathcal{E}_n}{\partial\hat m_n}$$
and show that it can be written in a similar form as the RLS estimator of Eq. (1):
$$\hat m_{n+1} = \lambda\hat m_n + (1-\lambda)\,x_n$$
where $\lambda = 1 - 2\mu$. Thus, the LMS and RLS algorithms for the recursive estimation of the mean are essentially equivalent. Note, however, that in adapting more than just one parameter, the LMS and RLS algorithms are no longer equivalent, the latter having a much faster learning speed at the expense of higher computational cost.

5. Do Problems 1.9 and 1.10. For Problem 1.10, suppose the mixing parameter $\epsilon$ is known in advance. Instead of sending $x$ and $y$ into a correlation canceler, you carry out a preprocessing operation, replacing $\{x, y\}$ by the signals $\{x', y'\}$, where $x' = x$ and $y' = y - \epsilon x$, and then send those into a correlation canceler. Determine the optimum canceler weight $H'$. Show that now the noise component of $x$ can be canceled completely. Draw a block diagram of all the processing operations. Note: The circumstances of this problem arise in adaptive antenna sidelobe canceling systems that use linearly polarized antennas. Polarization is used as a useful discriminant between signal and interference. In this application, the parameter $\epsilon$ is related to the known polarization angles of the desired signal. The interference signal is also polarized, but with unknown polarization angles with respect to the antennas; that does not matter, because the subsequent adaptive canceler determines them adaptively and cancels the interference completely.

6. (a) Let $\hat x$ be the optimum linear estimate of a scalar $x$ based on the random vector $y$. Show that $\hat x$ remains invariant under a linear invertible transformation of the observation vector, $y = Bz$.
(b) Show that $E[e\hat x] = 0$ and $E[e^2] = E[ex]$, where $e = x - \hat x$.
(c) If $x$ is uncorrelated with $y$, show that $\hat x = 0$.

7. Let $x$ be a random variable with mean $E[x] = m$. We wish to estimate $x$ in terms of a zero-mean vector of observations $y$. Because the mean of $x$ is not zero, we seek an estimate of the form
$$\hat x = h^T y + b$$
The $b$-term is called a bias term. Assume the correlations $R = E[yy^T]$ and $r = E[xy]$ are known. Show that the optimum choices for $h$ and $b$ that minimize the mean square estimation error $\mathcal{E} = E[e^2]$, where $e = x - \hat x$, are
$$h = R^{-1}r \qquad \text{and} \qquad b = m$$
Note: It is straightforward to reformulate such biased estimates adaptively. They are very common, especially in neural network applications.

8. (a) Show that the optimum estimate of $y$ based on itself is itself, that is, $\hat y = y$.
(b) Let $z = Qy$, where $Q$ does not have to be invertible or square. Show that the optimum estimate of $z$ based on $y$ is given by $\hat z = Qy$, that is, $\hat z = z$.
(c) Suppose $y$ is divided into two subvectors $y_1$ and $y_2$, that is, $y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$. Using the results of the previous part, or working directly, show that the optimum estimate of $y_1$ based on $y$, that is, $\hat y_1 = E[y_1 y^T]\,E[yy^T]^{-1}y$, is given simply by $\hat y_1 = y_1$.
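The result of Problem 7 can be illustrated with a small Monte Carlo sketch (not part of the assignment), assuming NumPy; the mixing coefficients, the bias, and the noise level below are arbitrary choices.

```python
# With zero-mean observations y, the optimal affine estimate of x uses
# h = R^{-1} r and bias b = m = E[x] (Problem 7).
import numpy as np

rng = np.random.default_rng(8)
T = 200_000
y = rng.normal(size=(3, T))                                   # zero-mean observations
x = 1.5 + np.array([0.5, -1.0, 2.0]) @ y + 0.3 * rng.normal(size=T)   # m = 1.5

R = (y @ y.T) / T                 # sample estimate of R = E[y y^T]
r = (y * x).mean(axis=1)          # sample estimate of r = E[x y]
h = np.linalg.solve(R, r)
b = x.mean()                      # optimal bias b = m (since E[y] = 0)

e = x - (h @ y + b)
print(h, b)                       # h ~ [0.5, -1.0, 2.0], b ~ 1.5
print(e.mean(), e.var())          # mean error ~ 0, residual variance ~ 0.09
```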

9. (a) A random variable $x$ is related to the random vectors $y_1$ and $y_2$ by
$$x = c_1^T y_1 + c_2^T y_2 + v = [c_1^T,\ c_2^T]\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} + v = c^T y + v$$
where $v$ is uncorrelated with $y_1$ and $y_2$. Show that the best estimate of $x$ based on the combined observation vector $y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$ is given by $\hat x = c^T y$. Therefore, the $y$-dependent part of $x$ is completely canceled from the error output $e = x - \hat x$, that is, $e = v$ in this case. (Hint: Show that the solution of the normal equations is $h = c$.)
(b) Determine the optimum estimate $\hat x = h^T y_1$ of $x$ based only on the first observation vector $y_1$, and show that in this case the $y_1$-dependent part of $x$ is still canceled completely from the error output $e = x - \hat x$, whereas the $y_2$-dependent part is canceled as much as possible, in the sense that $e$ is given by
$$e = v + c_2^T\big(y_2 - \hat y_{2/1}\big)$$
where
$$\hat y_{2/1} = E[y_2 y_1^T]\,E[y_1 y_1^T]^{-1}y_1 = R_{21}R_{11}^{-1}y_1$$
is the best estimate of $y_2$ based on $y_1$. (Hint: Express $h$ in terms of $c_1$, $c_2$, $R_{11}$, $R_{21}$.)
(c) Show that the minimized mean square error of the above case is given by
$$\mathcal{E} = E[e^2] = \sigma_v^2 + c_2^T\big(R_{22} - R_{21}R_{11}^{-1}R_{21}^T\big)c_2$$
where $R_{22} = E[y_2 y_2^T]$. Why is the second term in $\mathcal{E}$ non-negative? Note: The results of this problem will be used later to develop guidelines for picking the filter order in adaptive filtering applications.

10. Let
$$R = \begin{bmatrix} 2 & 4 & 4 \\ 4 & 9 & 10 \\ 4 & 10 & 14 \end{bmatrix}$$
be the covariance matrix of $y = [y_1, y_2, y_3]^T$, assumed to have zero mean. Determine the innovations representation $y = B\epsilon$ by carrying out the Gram-Schmidt orthogonalization of the components of $y$. Then, verify the factorization $R_{yy} = BR_{\epsilon\epsilon}B^T$ by explicit matrix multiplication. Next, consider the estimation of a random variable $x$ in terms of $y$. The cross-correlation between $x$ and $y$ is known to be
$$r = E[xy] = \begin{bmatrix} 4 \\ 4 \\ 2 \end{bmatrix}$$
Determine the optimum estimation weights $h$ and $g$ with respect to the correlated basis $y$ and the innovations basis $\epsilon$, that is,
$$\hat x = h^T y = g^T\epsilon$$
Hint: Use $g = D^{-1}Lr$ and $h = L^T g$, where $D = R_{\epsilon\epsilon}$ and $L = B^{-1}$.

11. For the previous problem, compute the optimum estimates of $x$ based on the three successively bigger subspaces $Y_1 = \{y_1\}$, $Y_2 = \{y_1, y_2\}$, $Y_3 = \{y_1, y_2, y_3\}$, in the forms
$$\hat x_1 = h_{11}y_1 = g_{11}\epsilon_1$$
$$\hat x_2 = h_{21}y_1 + h_{22}y_2 = g_{21}\epsilon_1 + g_{22}\epsilon_2$$
$$\hat x_3 = h_{31}y_1 + h_{32}y_2 + h_{33}y_3 = g_{31}\epsilon_1 + g_{32}\epsilon_2 + g_{33}\epsilon_3$$
Show that the $g$-weights are independent of the order, that is, $g_{pi} = g_i$, where $g_i$ was found in the previous problem. Show that the above estimates can be recursively constructed by
$$\hat x_1 = g_1\epsilon_1, \qquad \hat x_2 = \hat x_1 + g_2\epsilon_2, \qquad \hat x_3 = \hat x_2 + g_3\epsilon_3$$
Assuming $\sigma_x^2 = 30$, use the recursions $\mathcal{E}_i = \mathcal{E}_{i-1} - g_i^2 E[\epsilon_i^2]$, where $\mathcal{E}_i = E[e_i^2] = E\big[(x - \hat x_i)^2\big]$, to determine the successive estimation errors $\mathcal{E}_1, \mathcal{E}_2, \mathcal{E}_3$. Note the gradual improvement of the estimate as the number of observations is increased. Finally, determine the predictions $\hat y_{2/1}$ and $\hat y_{3/2}$ of $y_2$ and $y_3$ based on the past subspaces $Y_1$ and $Y_2$, respectively, write them in the forms
$$\hat y_{2/1} = -a_{21}y_1 = b_{21}\epsilon_1$$
$$\hat y_{3/2} = -a_{31}y_1 - a_{32}y_2 = b_{31}\epsilon_1 + b_{32}\epsilon_2$$
and show that the inverse innovations matrix $L = B^{-1}$ can be expressed as
$$L = \begin{bmatrix} 1 & 0 & 0 \\ b_{21} & 1 & 0 \\ b_{31} & b_{32} & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ a_{21} & 1 & 0 \\ a_{31} & a_{32} & 1 \end{bmatrix}$$
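A sketch (not part of the assignment) of the innovations machinery behind Problems 10 and 11 is given below, assuming NumPy. It obtains the unit lower-triangular $B$ and the diagonal $D$ from a Cholesky factorization rather than by explicit Gram-Schmidt; the numerical $R$ and $r$ simply echo Problem 10 and should be treated as illustrative, since any positive-definite covariance would do.

```python
# Factor R = B D B^T with B unit lower triangular, then compute the weights
# g (innovations basis) and h (correlated basis).
import numpy as np

R = np.array([[2.0, 4.0, 4.0],
              [4.0, 9.0, 10.0],
              [4.0, 10.0, 14.0]])      # example covariance E[y y^T]
r = np.array([4.0, 4.0, 2.0])          # example cross-correlation E[x y]

Lc = np.linalg.cholesky(R)             # R = Lc @ Lc.T, Lc lower triangular
d = np.diag(Lc)
B = Lc / d                             # unit lower triangular innovations matrix
D = np.diag(d**2)                      # D = R_ee, variances of the innovations

L = np.linalg.inv(B)                   # L = B^{-1}, so eps = L @ y
g = np.linalg.solve(D, L @ r)          # g = D^{-1} B^{-1} r
h = np.linalg.solve(R, r)              # h = R^{-1} r, equals B^{-T} g

print(np.allclose(B @ D @ B.T, R))     # True: verifies the factorization
print(g, h)
```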

12. Consider the deterministic random signal $y_n = 2\cos(\omega_1 n + \phi)$, where $\omega_1 = \pi/3$ and $\phi$ is a random phase distributed uniformly over the interval $[0, 2\pi]$.
(a) Show that $y_n$ satisfies an ordinary second-order homogeneous difference equation.
(b) Using the definition $R(k) = E[y_{n+k}y_n]$, show that $R(k) = 2\cos(\omega_1 k)$.
(c) Let $y = [y_0, y_1, y_2]^T$ be three consecutive samples. Using the results in (b), determine the $3\times 3$ autocorrelation matrix $R = E[yy^T]$ and show that it has zero determinant.
(d) Because of the singularity of $R$, we expect the Cholesky factorization to break down at dimension 3. To see this, carry out the Gram-Schmidt orthogonalization of $y$, starting with $y_0$ and ending with $y_2$, and thereby determine the factorization $R = BR_{\epsilon\epsilon}B^T$. Is the result consistent with part (a)?

13. (a) Let $R(k)$ be the autocorrelation function of a stationary random signal $y_n$. Express the autocorrelation matrix of the random vector $y = \begin{bmatrix} y_n \\ y_{n+k} \end{bmatrix}$ in terms of $R(k)$. Then, show the general inequality
$$|R(k)| \le R(0), \qquad \text{for all } k$$
(b) Let $u, v$ be two random variables. Show the Schwarz inequality
$$E[uv]^2 \le E[u^2]\,E[v^2]$$
Hint: $y = [u, v]^T$.

Supplement: Probability and Statistics Problems

1. (a) Let $x$ be a zero-mean gaussian random variable with variance $\sigma_x^2$. Show that
$$E[x^4] = 3\sigma_x^4$$
(b) Let $x = [x_1, x_2, \dots, x_N]^T$ be a block of mutually uncorrelated zero-mean gaussian random variables, each with variance $\sigma_x^2$. Using the above result, show that
$$E[x_i x_j x_k x_l] = \sigma_x^4\big(\delta_{ij}\delta_{kl} + \delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk}\big)$$
Show also that their covariance matrix is $R_{xx} = E[xx^T] = \sigma_x^2 I$, where $I$ is the $N\times N$ identity matrix.
(c) Suppose the above $N$ random variables $x$ are mixed up by an arbitrary invertible linear transformation $y = Bx$, resulting in the new set of gaussian random variables $y = [y_1, y_2, \dots, y_N]^T$. Let $R = E[yy^T]$ be their covariance matrix. Show that
$$R = \sigma_x^2 BB^T$$
(d) Show the analogous result of part (b):
$$E[y_i y_j y_k y_l] = R_{ij}R_{kl} + R_{ik}R_{jl} + R_{il}R_{jk}$$

2. An estimate of the mean $m$ of $N$ independent identically distributed random variables $\{y_1, y_2, \dots, y_N\}$ of variance $\sigma^2$ can be formed by the weighted sum
$$\hat m = h_1 y_1 + h_2 y_2 + \cdots + h_N y_N$$
Determine expressions for the mean and variance of $\hat m$, that is, the quantities $E[\hat m]$ and $\operatorname{var}(\hat m)$. What are the constraints on the weights $h_i$ in order for $\hat m$ to be an unbiased estimate of $m$? What are the optimal choices for these weights if, in addition, it is required that the variance $\operatorname{var}(\hat m)$ be minimum?
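Problem 12 lends itself to a quick Monte Carlo check. The sketch below (not part of the assignment, NumPy assumed) averages over random phases to estimate $R(k)$ and then tests the singularity of the $3\times 3$ autocorrelation matrix.

```python
# Estimate R(k) = E[y_{n+k} y_n] for y_n = 2*cos(w1*n + phi) with uniform
# random phase, and check that the 3x3 autocorrelation matrix is singular.
import numpy as np

w1 = np.pi / 3
rng = np.random.default_rng(2)
phi = rng.uniform(0.0, 2 * np.pi, size=200_000)   # random phase realizations

def Rhat(k):
    """Monte Carlo estimate of E[y_{n+k} y_n]; n drops out by stationarity."""
    n = 0
    return np.mean(2 * np.cos(w1 * (n + k) + phi) * 2 * np.cos(w1 * n + phi))

print([round(Rhat(k), 3) for k in range(3)])      # ~ [2.0, 1.0, -1.0] = 2*cos(w1*k)
R = np.array([[Rhat(abs(i - j)) for j in range(3)] for i in range(3)])
print(np.linalg.det(R))                           # ~ 0: R is rank deficient
```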

3. The sample mean of $N$ independent gaussian random variables $\{y_1, y_2, \dots, y_N\}$ of mean $m$ and variance $\sigma^2$ is given by
$$\hat m = \frac{1}{N}\big(y_1 + y_2 + \cdots + y_N\big)$$
First, show that $\hat m$ is unbiased and that its variance is $\operatorname{var}(\hat m) = \sigma^2/N$. Then, show that the probability density of $\hat m$ is
$$p(\hat m) = \frac{N^{1/2}}{(2\pi)^{1/2}\sigma}\exp\Big[-\frac{N}{2\sigma^2}(\hat m - m)^2\Big]$$
Moreover, show that as $N\to\infty$, this density converges to the deterministic delta-function density $p(\hat m)\to\delta(\hat m - m)$.

4. Consider $N$ independent gaussian random variables $\{y_1, y_2, \dots, y_N\}$ of mean $m$ and variance $\sigma^2$. The sample variance is defined as
$$\hat\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat m)^2$$
where $\hat m$ is the sample mean as defined above. Show that the mean and variance of the sample variance are given by
$$E[\hat\sigma^2] = \frac{N-1}{N}\,\sigma^2, \qquad \operatorname{var}(\hat\sigma^2) = \frac{N-1}{N}\,\frac{2\sigma^4}{N}$$
Note: This is somewhat lower than the CR lower bound $2\sigma^4/N$. But this is no contradiction, because the CR bound applies to unbiased estimators and the above is slightly biased.

5. Continuing with the previous problem, we can form an unbiased estimator of the variance, the square of the sample standard deviation:
$$s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - \hat m)^2$$
Therefore, $s^2 = N\hat\sigma^2/(N-1)$. Show that its mean and variance are
$$E[s^2] = \sigma^2, \qquad \operatorname{var}(s^2) = \frac{2\sigma^4}{N-1}$$
This does satisfy the CR bound.

6. Next, we determine that the probability distribution of $s^2$ is a $\chi^2$-distribution with $N-1$ degrees of freedom. In the definition of $s^2$, there are $N$ squared terms $(y_i - \hat m)^2$, yet we divided by $N-1$, not $N$. These terms are not mutually independent because of the presence of $\hat m$. Using these dependencies, one can express $s^2$ as a sum of $N-1$ independent squared terms, as follows.
(a) Consider the following linear transformation (known as Helmert's transformation) from the set $\{y_1, \dots, y_N\}$ to a new set $\{z_1, \dots, z_N\}$:
$$z_i = c_i\big(y_1 + y_2 + \cdots + y_i - i\,y_{i+1}\big), \qquad i = 1, 2, \dots, N-1$$
$$z_N = c_N\big(y_1 + y_2 + \cdots + y_N\big)$$
Determine the scale factors $c_i$ in order for the $z_i$ to have unit variance.
(b) Then, show that the $z_i$, $i = 1, 2, \dots, N-1$, have zero mean and are mutually uncorrelated:
$$E[z_i z_j] = \delta_{ij}, \qquad i, j = 1, 2, \dots, N-1$$
(c) Then, show that the linear transformation preserves the sum of the squares,
$$\sum_{i=1}^{N} z_i^2 = \frac{1}{\sigma^2}\sum_{i=1}^{N} y_i^2$$
therefore, it is an orthogonal transformation. Finally, show that the sum of the first $N-1$ squared terms is
$$\chi^2 = \sum_{i=1}^{N-1} z_i^2 = \frac{1}{\sigma^2}\sum_{i=1}^{N}(y_i - \hat m)^2$$
Thus, the sum of the $N$ squared terms on the right-hand side follows a normalized $\chi^2$-distribution with $N-1$ degrees of freedom.
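The moments claimed in Problems 3 through 5 can be checked by simulation. A minimal sketch follows (not part of the assignment, NumPy assumed); the values of $N$, $\sigma$, $m$, and the number of trials are arbitrary.

```python
# The biased sample variance has mean (N-1)/N * sigma^2, while s^2 is unbiased
# with variance 2*sigma^4/(N-1).
import numpy as np

rng = np.random.default_rng(3)
N, sigma, m = 10, 2.0, 1.0
trials = 200_000
y = rng.normal(m, sigma, size=(trials, N))

mhat = y.mean(axis=1)
var_biased = ((y - mhat[:, None]) ** 2).sum(axis=1) / N       # sigma_hat^2
s2 = ((y - mhat[:, None]) ** 2).sum(axis=1) / (N - 1)         # unbiased s^2

print(var_biased.mean(), (N - 1) / N * sigma**2)   # ~3.6 vs 3.6
print(s2.mean(), sigma**2)                         # ~4.0 vs 4.0
print(s2.var(), 2 * sigma**4 / (N - 1))            # ~3.56 vs 3.56
```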

7. The following twenty random numbers come from an unknown probability distribution:
$$\{0.33,\ 0.52,\ 2.4,\ 1.93,\ 0.46,\ 0.44,\ 0.97,\ 0.38,\ 0.48,\ 1.29,\ 1.82,\ 1.23,\ 0.2,\ 2.66,\ 1.22,\ 0.4,\ 0.95,\ 1.47,\ 0.83,\ 0.43\}$$
Test the hypothesis that the underlying distribution is gaussian with zero mean and unit variance. To do this, perform the $\chi^2$ test by dividing the range of the gaussian distribution into the following six bins:
$$(-\infty, -1.5),\ (-1.5, -0.5),\ (-0.5, 0.0),\ (0.0, 0.5),\ (0.5, 1.5),\ (1.5, \infty)$$
If the $i$-th bin is the interval $(x_{i-1}, x_i)$, then the theoretically expected number of observations that will fall into the $i$-th bin will be
$$N_i^{\rm th} = N\big[F(x_i) - F(x_{i-1})\big]$$
where $N$ is the total number of observations and $F(x)$ is the cdf of the assumed gaussian distribution, that is,
$$F(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-z^2/2}\,dz$$
Let $N_i$ be the actual number of observations that fall into the $i$-th bin. Then, calculate the $\chi^2$ statistic given by
$$\chi^2 = \sum_{i=1}^{B}\frac{\big(N_i - N_i^{\rm th}\big)^2}{N_i^{\rm th}}$$
where $B$ is the number of bins; here, $B = 6$. This quantity follows a $\chi^2$-distribution with $B-1$ degrees of freedom. Thus, its mean will be equal to the number of degrees of freedom, namely, $B-1$. If your calculated $\chi^2$ is near the theoretical mean $B-1$, then you cannot reject the hypothesis that the pdf was gaussian. Alternatively, you can look up the 95 percent confidence interval of the $\chi^2$ distribution with $B-1$ degrees of freedom, that is, the interval $0 \le \chi^2 \le \chi_1^2$ such that the probability of a $\chi^2$ value falling in it is 0.95, or equivalently, the probability of a $\chi^2$ value falling outside it is only 0.05. Then, if your calculated value of $\chi^2$ falls within that interval, you can conclude with 95 percent confidence that the gaussian assumption cannot be rejected. Note: For $B-1 = 5$ degrees of freedom, we have $\chi_1^2 = 11.07$.

8. Let $F(x)$ be the cdf of a pdf $f(x)$. Show that the random variable $u$ defined by
$$u = F(x)$$
is distributed uniformly over the interval $[0, 1)$. Therefore, random variables $x$ following the pdf $f(x)$ can be generated from a uniform random number generator using the inverse function $x = F^{-1}(u)$. This is the inversion method for generating random numbers from uniform ones (see Appendix A).

9. The Rayleigh probability density finds application in fading communication channels:
$$p(r) = \frac{r}{\sigma^2}\,e^{-r^2/2\sigma^2}, \qquad \text{for } r \ge 0$$
Using the inversion method, show how to generate a Rayleigh-distributed random variable $r$ from a uniform variable $u$.

10. The inversion method may also be applied to the problem of generating discrete-valued random variables. Let $x$ be a random variable that can only take one of the discrete values $\{x_1, x_2, \dots, x_M\}$ with probabilities $\{p_1, p_2, \dots, p_M\}$, respectively. It is assumed, of course, that the $p_i$ sum up to unity. You have available a uniform generator in the interval $[0, 1)$. Explain how to generate the discrete random numbers $x$ from a uniform $u$.

11. You want to simulate a binary experiment in which only two outcomes can occur, one with probability $p$ and the other with probability $1-p$. For example, simulating successive throws of heads or tails, or the transmission of bits 0 or 1, or an accept/reject decision, etc. This is the same as the previous problem, with $M = 2$. The procedure for picking one or the other outcome can be mechanized as follows:
1. Generate a uniform $u$.
2. If $0 \le u < p$, then pick the first outcome.
3. If $p \le u < 1$, then pick the second outcome.
Explain why this procedure generates the two outcomes with the correct probabilities $p$ and $1-p$.
Note: The optimization method of simulated annealing uses such two-valued random variables. It is an iterative method of minimizing a performance index $J(\lambda)$, where $\lambda$ is a vector of parameters with respect to which $J$ must be minimized. Consider two successive choices of the parameter vector, $\lambda_{\rm new}$ and $\lambda_{\rm old}$, and compute the change in the performance index $\Delta J = J(\lambda_{\rm new}) - J(\lambda_{\rm old})$. Most iterative minimization algorithms, such as steepest descent or Newton's method, try to continuously keep decreasing $J$, that is, they demand that the change in $\lambda$ always be such that $\Delta J \le 0$. This can easily drive $\lambda$ into a local minimum of $J$, where the algorithm then gets stuck. To alleviate this problem, the so-called Metropolis algorithm of simulated annealing allows $J$ to increase on occasion, that is, $\Delta J > 0$, in order to jump over such local minima and continue decreasing towards the absolute minimum. The algorithm is as follows: If $\Delta J \le 0$, then accept the change in the parameter vector, $\lambda_{\rm old}\to\lambda_{\rm new}$. But if $\Delta J > 0$, then accept the change only with probability $p = e^{-\beta\Delta J}$ and reject it with probability $1-p$, where $\beta$ is a suitable positive constant. Using the results of this problem, it should be clear how one would make the decision of whether to accept or reject the change.
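A sketch of the $\chi^2$ test of Problem 7 is given below (not part of the assignment). The helper names norm_cdf and chi2_statistic are ad hoc, and the sample passed in the usage line is synthetic; substitute the twenty values listed in Problem 7.

```python
import numpy as np
from math import erf

def norm_cdf(x):
    """Standard normal cdf F(x)."""
    return 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

def chi2_statistic(data, edges=(-np.inf, -1.5, -0.5, 0.0, 0.5, 1.5, np.inf)):
    """Chi-square statistic for the bins of Problem 7 against N(0, 1)."""
    data = np.asarray(data, dtype=float)
    N = len(data)
    # observed counts N_i and theoretical counts N_i_th = N*(F(x_i) - F(x_{i-1}))
    N_obs = np.array([np.sum((data > lo) & (data <= hi))
                      for lo, hi in zip(edges[:-1], edges[1:])])
    N_th = N * np.diff([norm_cdf(x) for x in edges])
    return np.sum((N_obs - N_th) ** 2 / N_th)

# usage: replace this synthetic sample with the twenty values of Problem 7
sample = np.random.default_rng(4).normal(size=20)
print(chi2_statistic(sample))   # compare against B - 1 = 5 degrees of freedom
```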

12. Consider the Box-Muller transformation
$$x = (-2\ln u)^{1/2}\cos(2\pi v), \qquad y = (-2\ln u)^{1/2}\sin(2\pi v)$$
Show that if $\{u, v\}$ are independent uniform random variables in the interval $[0, 1)$, then $\{x, y\}$ are two independent gaussian random variables with zero mean and unit variance.

13. Consider the generalized Box-Muller transformation
$$x = (-2\ln u)^{1/2}\cos(2\pi v), \qquad y = (-2\ln u)^{1/2}\cos(2\pi v - \varphi)$$
where $\varphi$ is a constant angle. Show that if $\{u, v\}$ are independent uniform random variables in the interval $[0, 1)$, then $\{x, y\}$ are two jointly gaussian random variables with zero mean, unit variance, and correlation coefficient $E[xy] = \cos\varphi$.

14. Let $X_1$ and $X_2$ be two independent random variables with cdf's $F_1(x)$ and $F_2(x)$. Show that the random variable $X = \max(X_1, X_2)$ has cdf $F(x) = F_1(x)F_2(x)$. Show also that $X = \min(X_1, X_2)$ has cdf $F(x) = F_1(x) + F_2(x) - F_1(x)F_2(x)$.

15. The inversion method of generating random variables is convenient only when the cdf $F(x)$ is known in closed form or is easily computed. An alternative method that works well when the pdf $f(x)$ is known but the cdf $F(x)$ is complicated, as in the gaussian case, is the rejection method. It requires two conditions that are not difficult to meet. First, there exists a so-called majorizing pdf $g(x)$ such that $f(x)$ is bounded from above by
$$f(x) \le c\,g(x), \qquad \text{for all } x$$
where $c$ is a given constant. Second, it is much easier to generate random variables from the distribution $g(x)$ than from $f(x)$. The following algorithm generates an $x$ distributed according to $f(x)$:
1. Generate an $x$ from the distribution $g(x)$.
2. Generate a $y$ which is uniformly distributed over $[0, c\,g(x)]$.
3. If $y \le f(x)$, then output $x$; else, go to step 1 and repeat.
To show that this procedure correctly generates $x$'s that are distributed according to $f(x)$, we must show that the conditional density of an $x$ generated as above, given that $y \le f(x)$, is equal to the desired density $f(x)$, that is,
$$p\big(X = x \mid Y \le f(X)\big) = f(x)$$
(a) Show first that necessarily $c \ge 1$ and that
$$p\big(Y \le f(x) \mid X = x\big) = \frac{f(x)}{c\,g(x)}$$
which follows from the fact that $y$ is uniform.
(b) Then, integrate the above over all $x$'s generated from $g(x)$ to get
$$p\big(Y \le f(X)\big) = \frac{1}{c}$$
(c) Finally, use Bayes' rule to determine the quantity
$$p\big(X = x \mid Y \le f(X)\big) = \frac{p\big(Y \le f(x) \mid X = x\big)\,p(X = x)}{p\big(Y \le f(X)\big)}$$

16. Let $y$ be an $M$-dimensional gaussian random vector with zero mean and covariance matrix $R$. Show that the information content or entropy of $y$ is given by
$$S = -\int p(y)\ln p(y)\,d^M y = \tfrac{1}{2}\ln(\det R)$$
up to an unimportant additive constant.

17. Let $y = B\epsilon$ be the innovations representation of an $M$-dimensional gaussian zero-mean vector. Show that its entropy can be written, up to an additive constant, as follows:
$$S = -\int p(y)\ln p(y)\,d^M y = \frac{1}{2}\sum_{i=1}^{M}\ln E_i$$
where $E_i = E[\epsilon_i^2]$ are the variances of the innovations.
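The rejection method of Problem 15 can be illustrated as follows (not part of the assignment, NumPy assumed). The target and majorizing densities here, a half-normal $f$ and an exponential $g$, are illustrative choices; any pair satisfying $f(x) \le c\,g(x)$ would do.

```python
# Draw from a half-normal target f(x) using an exponential majorizer g(x).
import numpy as np

rng = np.random.default_rng(5)

def f(x):                                    # target: half-normal density, x >= 0
    return np.sqrt(2.0 / np.pi) * np.exp(-x**2 / 2.0)

def g(x):                                    # majorizing density: Exp(1)
    return np.exp(-x)

c = np.sqrt(2.0 * np.e / np.pi)              # smallest c with f(x) <= c*g(x)

def draw():
    while True:
        x = rng.exponential(1.0)             # step 1: x ~ g(x)
        y = rng.uniform(0.0, c * g(x))       # step 2: y uniform on [0, c*g(x)]
        if y <= f(x):                        # step 3: accept, else repeat
            return x

samples = np.array([draw() for _ in range(50_000)])
print(samples.mean(), np.sqrt(2.0 / np.pi))  # half-normal mean ~ 0.798
```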

18. (a) For any two positive real numbers $a$ and $b$, show the inequality
$$a\ln\frac{a}{b} \ge a - b$$
(b) Let $y$ be an $M$-dimensional random vector. For any two probability densities $p(y)$ and $q(y)$, prove the following information inequality,
$$\int p(y)\ln\frac{p(y)}{q(y)}\,d^M y \ge 0$$
with equality attained when $p(y) = q(y)$.

19. Consider the subset of all $M$-dimensional probability densities $p(y)$ that have a given mean $m$ and covariance $\Sigma$. Show that the density from this subset that has maximum entropy,
$$S = -\int p(y)\ln p(y)\,d^M y = \max$$
is the gaussian. Hint: Use Lagrange multipliers to enforce the given constraints. Alternatively, use the information inequality of the previous problem.

20. Let $Re_i = \lambda_i e_i$, $i = 1, 2, \dots, M$, be the $M$ eigenvalues and orthonormal eigenvectors of the covariance matrix of an $M$-dimensional random vector $y$. Define the $M$ transformed random variables
$$z_i = e_i^T y, \qquad i = 1, 2, \dots, M$$
(a) Show that they are mutually uncorrelated with variances $\lambda_i$, that is,
$$E[z_i z_j] = \lambda_i\,\delta_{ij}$$
(b) Show that $y$ can be expanded in terms of the $z_i$ as follows:
$$y = \sum_{i=1}^{M} z_i e_i$$
Thus, the randomness of $y$ arises only from the randomness of the $z_i$'s, which are uncorrelated. If the eigenvalues are arranged in decreasing order and the first $L$ largest eigenvalues are dominant, then the sum may be approximated by
$$y \simeq \sum_{i=1}^{L} z_i e_i$$
Thus, the $M$-vector $y$ is represented by only $L < M$ parameters, namely, $z_1, z_2, \dots, z_L$. This approximation forms the basis of data compression using the Karhunen-Loeve transform.
(c) Show the equality of quadratic forms
$$y^T R^{-1} y = \sum_{i=1}^{M}\frac{z_i^2}{\lambda_i}$$
(d) Determine the pdf $p_z(z)$ of the vector $z = [z_1, z_2, \dots, z_M]^T$ in terms of the pdf $p_y(y)$ (do not assume gaussian distributions). Show that the information content of $y$ is the same as that of $z$, in the sense that they have equal entropies.
(e) If we denote by $B$ the modal matrix of $R$, that is, the matrix whose columns are the eigenvectors $e_i$, then show that $y$ is related to the $z$-basis as $y = Bz$, where
$$B = [e_1, e_2, \dots, e_M]$$
Show also that $B$ satisfies $BB^T = B^T B = I$, and that $R = BDB^T$, with $D = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_M)$.
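The truncation idea of Problem 20(b) can be illustrated with a short sketch (not part of the assignment, NumPy assumed); the dimensions and the synthetic covariance below are arbitrary.

```python
# Karhunen-Loeve compression: keep only the components along the L dominant
# eigenvectors of the covariance matrix R.
import numpy as np

rng = np.random.default_rng(6)
M, L, T = 8, 2, 5000

# build random vectors whose covariance has two dominant eigenvalues
A = rng.normal(size=(M, 2))
y = A @ rng.normal(size=(2, T)) + 0.1 * rng.normal(size=(M, T))
R = (y @ y.T) / T                          # sample covariance E[y y^T]

lam, E = np.linalg.eigh(R)                 # R e_i = lam_i e_i
order = np.argsort(lam)[::-1]              # arrange eigenvalues in decreasing order
lam, E = lam[order], E[:, order]

z = E.T @ y                                # z_i = e_i^T y
y_approx = E[:, :L] @ z[:L, :]             # keep the L largest terms of the expansion

rel_err = np.linalg.norm(y - y_approx) / np.linalg.norm(y)
print(lam[:3])                             # two dominant eigenvalues, rest small
print(rel_err)                             # small relative reconstruction error
```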

332:525 Solutions

1. Differentiating $\mathcal{E}_n$ with respect to $\hat m$ and setting the gradient to zero gives
$$\frac{\partial\mathcal{E}_n}{\partial\hat m} = -2\sum_{k=0}^{n}(x_k - \hat m) = 0$$
with optimum solution
$$\hat m_n = \frac{1}{n+1}\sum_{k=0}^{n} x_k$$
In part (b), the required recursions were shown in class. For part (c), we take expectations of both sides of the definition of $\hat m_n$ to get
$$E[\hat m_n] = \frac{1}{n+1}\sum_{k=0}^{n}E[x_k] = \frac{1}{n+1}(n+1)m = m$$
Next, we have
$$\hat m_n - m = \frac{1}{n+1}\sum_{k=0}^{n}x_k - m = \frac{1}{n+1}\sum_{k=0}^{n}(x_k - m)$$
The variance of $\hat m_n$ will then be
$$E[(\hat m_n - m)^2] = \frac{1}{(n+1)^2}\sum_{k=0}^{n}\sum_{j=0}^{n}E[(x_k - m)(x_j - m)]$$
And, using the iid assumption, we have $E[(x_k - m)(x_j - m)] = \sigma_x^2\delta_{kj}$, which gives for the variance of $\hat m_n$:
$$E[(\hat m_n - m)^2] = \frac{1}{(n+1)^2}\sum_{k=0}^{n}\sum_{j=0}^{n}\sigma_x^2\delta_{kj} = \frac{\sigma_x^2}{n+1}$$

2. The gradient of the performance index is now
$$\frac{\partial\mathcal{E}_n}{\partial\hat m} = -2\sum_{k=0}^{n}\lambda^{n-k}(x_k - \hat m) = 0 \quad\Longrightarrow\quad \hat m_n = \frac{\displaystyle\sum_{k=0}^{n}\lambda^{n-k}x_k}{\displaystyle\sum_{k=0}^{n}\lambda^{n-k}} = \frac{x_n + \lambda x_{n-1} + \lambda^2 x_{n-2} + \cdots + \lambda^n x_0}{1 + \lambda + \lambda^2 + \cdots + \lambda^n}$$
Using the finite geometric series, we may write the denominator as
$$\sum_{k=0}^{n}\lambda^{n-k} = 1 + \lambda + \lambda^2 + \cdots + \lambda^n = \frac{1 - \lambda^{n+1}}{1 - \lambda}$$
which gives for the estimator $\hat m_n$:
$$\hat m_n = \frac{1-\lambda}{1-\lambda^{n+1}}\sum_{k=0}^{n}\lambda^{n-k}x_k$$
Replacing $n$ by $n-1$ and multiplying by a factor of $\lambda$ gives
$$\lambda\hat m_{n-1} = \frac{(1-\lambda)\lambda}{1-\lambda^{n}}\sum_{k=0}^{n-1}\lambda^{n-1-k}x_k = \frac{1-\lambda}{1-\lambda^{n}}\sum_{k=0}^{n-1}\lambda^{n-k}x_k$$
Thus, we can express the sum up to $k = n-1$ in terms of $\hat m_{n-1}$:
$$(1-\lambda)\sum_{k=0}^{n-1}\lambda^{n-k}x_k = \lambda\big(1-\lambda^{n}\big)\hat m_{n-1}$$
Therefore, we obtain the recursion for $\hat m_n$:
$$\hat m_n = \frac{(1-\lambda)\Big(\sum_{k=0}^{n-1}\lambda^{n-k}x_k + x_n\Big)}{1-\lambda^{n+1}} = \frac{\lambda - \lambda^{n+1}}{1-\lambda^{n+1}}\,\hat m_{n-1} + \frac{1-\lambda}{1-\lambda^{n+1}}\,x_n$$
which can be written in the predictor/corrector form
$$\hat m_n = \hat m_{n-1} + \Big(\frac{1-\lambda}{1-\lambda^{n+1}}\Big)(x_n - \hat m_{n-1})$$
In the limit $\lambda\to 1$, the Kalman gain coefficient tends to the expected limit
$$\lim_{\lambda\to 1}\Big(\frac{1-\lambda}{1-\lambda^{n+1}}\Big) = \frac{1}{n+1}$$
On the other hand, if $\lambda$ is strictly less than one, then the term $\lambda^{n+1}$ can be ignored after a few iterations, and therefore the recursion becomes essentially the first-order smoother
$$\hat m_n = \hat m_{n-1} + (1-\lambda)(x_n - \hat m_{n-1}) = \lambda\hat m_{n-1} + (1-\lambda)x_n$$

3. The difference equation $\hat m_n = \lambda\hat m_{n-1} + (1-\lambda)x_n$ can be solved, assuming zero initial conditions, by convolving the $x_n$ sequence with the filter sequence $(1-\lambda)\lambda^n$. This gives
$$\hat m_n = (1-\lambda)\sum_{k=0}^{n}\lambda^{n-k}x_k$$
Taking expectations of both sides and using the finite geometric series, we obtain
$$E[\hat m_n] = (1-\lambda)\sum_{k=0}^{n}\lambda^{n-k}m = \big(1-\lambda^{n+1}\big)m$$
which tends to $m$ for large $n$. Thus, $\hat m_n$ is asymptotically unbiased. Subtracting the mean $E[\hat m_n]$ from $\hat m_n$ gives also
$$\hat m_n - E[\hat m_n] = (1-\lambda)\sum_{k=0}^{n}\lambda^{n-k}(x_k - m)$$
Using the same sort of calculation as in Problem 1, we obtain for the variance of $\hat m_n$:
$$E\big[(\hat m_n - E[\hat m_n])^2\big] = (1-\lambda)^2\sum_{k=0}^{n}\sum_{j=0}^{n}\lambda^{n-k}\lambda^{n-j}E[(x_k - m)(x_j - m)] = (1-\lambda)^2\sum_{k=0}^{n}\sum_{j=0}^{n}\lambda^{n-k}\lambda^{n-j}\sigma_x^2\delta_{kj}$$
$$= \sigma_x^2(1-\lambda)^2\sum_{k=0}^{n}\lambda^{2(n-k)} = \sigma_x^2(1-\lambda)^2\,\frac{1-\lambda^{2(n+1)}}{1-\lambda^2} = \frac{1-\lambda}{1+\lambda}\,\sigma_x^2\big(1-\lambda^{2(n+1)}\big)$$
which in the limit of large $n$ converges to the required result.

4. The theoretical gradient is
$$\frac{\partial\mathcal{E}}{\partial\hat m} = -2E[(x_n - \hat m)] = -2(m - \hat m)$$
Thus, it vanishes when $\hat m = m$. The instantaneous gradient is obtained by dropping the expectation value, that is,
$$\frac{\partial\mathcal{E}_n}{\partial\hat m} = -2(x_n - \hat m)$$
Putting this into the LMS updating equation gives
$$\hat m_{n+1} = \hat m_n + \Delta\hat m_n = \hat m_n - \mu\,\frac{\partial\mathcal{E}_n}{\partial\hat m_n} = \hat m_n + 2\mu(x_n - \hat m_n)$$
Setting $1 - 2\mu = \lambda$, we rewrite the difference equation as
$$\hat m_{n+1} = \hat m_n + 2\mu(x_n - \hat m_n) = (1-2\mu)\hat m_n + 2\mu x_n = \lambda\hat m_n + (1-\lambda)x_n$$
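The limiting variance derived in Problem 3 can be checked by simulation. The sketch below (not part of the solutions, NumPy assumed) runs many independent realizations of the first-order smoother; the values of lambda, sigma_x, and m are arbitrary choices.

```python
# Ensemble variance of the first-order smoother vs. the theoretical limit
# (1 - lambda)/(1 + lambda) * sigma_x^2.
import numpy as np

rng = np.random.default_rng(7)
lam, sigma_x, m = 0.9, 1.0, 3.0
trials, n_steps = 10_000, 300

x = rng.normal(m, sigma_x, size=(trials, n_steps))
mhat = np.zeros(trials)
for n in range(n_steps):
    mhat = lam * mhat + (1 - lam) * x[:, n]       # Eq. (1), zero initial condition

# variance across realizations after many steps vs. the predicted limit
print(mhat.var(), (1 - lam) / (1 + lam) * sigma_x**2)   # both ~ 0.0526
```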

5. Problem 1.9: Using $x = s + n_1 = s + Fn_2$ and $y = n_2$, we find $R_{yy} = E[y^2] = E[n_2^2]$ and $R_{xy} = E[xy] = E[(s + Fn_2)n_2] = F\,E[n_2^2]$. The optimal canceler will be
$$H = R_{xy}R_{yy}^{-1} = F\,E[n_2^2]\,E[n_2^2]^{-1} = F$$
The corresponding optimum estimate will be $\hat x = Hy = Fn_2$, and the estimation error $e = x - \hat x = (s + Fn_2) - Fn_2 = s$.
Problem 1.10: First determine $H$. Noting that $y = n_2 + \epsilon s = F^{-1}n_1 + \epsilon s$ and using the definition of the gain $G$, we find $R_{yy}$ and $R_{xy}$:
$$R_{yy} = E[yy] = \frac{1}{F^2}E[n_1^2] + \epsilon^2 E[s^2] = \Big(\frac{1}{F^2} + \epsilon^2 G\Big)E[n_1^2]$$
$$R_{xy} = E[xy] = \frac{1}{F}E[n_1^2] + \epsilon E[s^2] = \Big(\frac{1}{F} + \epsilon G\Big)E[n_1^2]$$
Therefore,
$$H = R_{xy}R_{yy}^{-1} = \frac{\dfrac{1}{F} + \epsilon G}{\dfrac{1}{F^2} + \epsilon^2 G} = \frac{F(1 + \epsilon FG)}{1 + \epsilon^2 F^2 G}$$
The error output will be
$$e = x - \hat x = x - Hy = s + n_1 - H\Big(\frac{1}{F}n_1 + \epsilon s\Big) = (1 - \epsilon H)\,s + \Big(1 - \frac{H}{F}\Big)n_1$$
Thus, the coefficients $a$ and $b$ will be
$$a = 1 - \epsilon H = 1 - \frac{\epsilon F(1 + \epsilon FG)}{1 + \epsilon^2 F^2 G} = \frac{1 + \epsilon^2 F^2 G - \epsilon F - \epsilon^2 F^2 G}{1 + \epsilon^2 F^2 G} = \frac{1 - \epsilon F}{1 + \epsilon^2 F^2 G}$$
$$b = 1 - \frac{H}{F} = 1 - \frac{1 + \epsilon FG}{1 + \epsilon^2 F^2 G} = \frac{1 + \epsilon^2 F^2 G - 1 - \epsilon FG}{1 + \epsilon^2 F^2 G} = -\frac{\epsilon FG(1 - \epsilon F)}{1 + \epsilon^2 F^2 G} = -\epsilon FG\,a$$
If the coefficient $\epsilon$ is known in advance, then the pre-processed signals will be
$$x' = x = s + n_1 = s + Fn_2$$
$$y' = y - \epsilon x = n_2 + \epsilon s - \epsilon s - \epsilon Fn_2 = (1 - \epsilon F)\,n_2$$
Thus, $y'$ is correlated only with the noise part of $x'$. We find
$$E[x'y'] = F(1 - \epsilon F)E[n_2^2], \qquad E[y'y'] = (1 - \epsilon F)^2 E[n_2^2]$$
and, therefore,
$$H' = E[x'y']\,E[y'y']^{-1} = \frac{F}{1 - \epsilon F}$$
$$\hat x = H'y' = \frac{F}{1 - \epsilon F}\,(1 - \epsilon F)n_2 = Fn_2, \qquad e = x - \hat x = s + Fn_2 - Fn_2 = s$$

6. For part (a), we have
$$E[yy^T] = B\,E[zz^T]\,B^T \quad\Longrightarrow\quad E[yy^T]^{-1} = B^{-T}E[zz^T]^{-1}B^{-1}$$
And, similarly, $E[xy] = B\,E[xz]$. The optimal Wiener weights with respect to the two bases are
$$h = E[yy^T]^{-1}E[xy], \qquad g = E[zz^T]^{-1}E[xz]$$
Therefore, they are related by
$$h = E[yy^T]^{-1}E[xy] = B^{-T}E[zz^T]^{-1}B^{-1}\,B\,E[xz] = B^{-T}g$$
or, $h^T = g^T B^{-1}$. It follows that the optimal estimate $\hat x$ will be invariant under a change of basis:
$$\hat x = h^T y = g^T B^{-1}Bz = g^T z$$
Parts (b) and (c) were done in class.

7. The estimation error is $e = x - \hat x = x - h^T y - b$. The minimization conditions for the performance index $\mathcal{E} = E[e^2]$ are
$$\frac{\partial\mathcal{E}}{\partial h} = 2E\Big[e\,\frac{\partial e}{\partial h}\Big] = -2E[ey] = 0, \qquad \frac{\partial\mathcal{E}}{\partial b} = 2E\Big[e\,\frac{\partial e}{\partial b}\Big] = -2E[e] = 0$$
which are equivalent to
$$E[ey] = E[(x - y^T h - b)y] = E[xy] - E[yy^T]h - bE[y] = r - Rh = 0$$
$$E[e] = E[x - h^T y - b] = E[x] - h^T E[y] - b = m - b = 0$$
where we used $E[y] = 0$. Hence, $h = R^{-1}r$ and $b = m$.

8. Part (a) follows from part (b) with the choice $Q = I$. For part (b), we have
$$R_{zy} = E[zy^T] = Q\,E[yy^T] = QR_{yy} \quad\Longrightarrow\quad H = R_{zy}R_{yy}^{-1} = Q$$
It follows that $\hat z = Hy = Qy = z$. Part (c) can be shown as follows. Note that the subvector $y_1$ can be obtained from the full vector $y$ by the projection matrix
$$y_1 = [\,I\ \ 0\,]\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = Qy$$
where $I$ is the identity matrix with the same dimension as $y_1$. Using part (b) with $z = y_1$, we find $\hat y_1 = y_1$. This result can also be shown directly, as follows. Using the notation $R_{ij} = E[y_i y_j^T]$, for $i, j = 1, 2$, we have
$$E[y_1 y^T] = E\big[y_1[y_1^T,\ y_2^T]\big] = \big[E[y_1 y_1^T],\ E[y_1 y_2^T]\big] = [R_{11},\ R_{12}]$$
$$E[yy^T] = E\Big[\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}[y_1^T,\ y_2^T]\Big] = \begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix}$$
But noting that
$$[R_{11},\ R_{12}] = [\,I\ \ 0\,]\begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix}$$
we obtain
$$H = E[y_1 y^T]\,E[yy^T]^{-1} = [R_{11},\ R_{12}]\begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix}^{-1} = [\,I\ \ 0\,]$$
Thus, $\hat y_1 = Hy = [\,I\ \ 0\,]\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = y_1$.

9. Using part (a) of the previous problem, we have $\hat y = y$. Therefore, $\hat x = \widehat{c^T y} = c^T\hat y = c^T y$ and $e = x - \hat x = c^T y + v - c^T y = v$. If the estimation is based only on the subvector $y_1$, then we have $\hat y_1 = y_1$, and therefore
$$\hat x = c_1^T\hat y_1 + c_2^T\hat y_{2/1} = c_1^T y_1 + c_2^T\hat y_{2/1}$$
and for the error output
$$e = x - \hat x = c_1^T y_1 + c_2^T y_2 + v - c_1^T y_1 - c_2^T\hat y_{2/1} = v + c_2^T\big(y_2 - \hat y_{2/1}\big)$$
Setting $e_2 = y_2 - \hat y_{2/1}$, we have $e = v + c_2^T e_2$. And,
$$\mathcal{E} = E[e^2] = \sigma_v^2 + c_2^T E[e_2 e_2^T]\,c_2$$
But $E[e_2 e_2^T] = R_{22} - R_{21}R_{11}^{-1}R_{12}$, which also shows the non-negativity property.

10. Going through the Gram-Schmidt orthogonalization procedure, we find the matrices $B$ and $D = R_{\epsilon\epsilon}$:
$$B = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 2 & 2 & 1 \end{bmatrix}, \qquad D = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}$$
We also need the inverses
$$B^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 2 & -2 & 1 \end{bmatrix}, \qquad R^{-1} = B^{-T}D^{-1}B^{-1} = \begin{bmatrix} 13/2 & -4 & 1 \\ -4 & 3 & -1 \\ 1 & -1 & 1/2 \end{bmatrix}$$
Thus, the innovations basis is
$$\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \end{bmatrix} = B^{-1}y = \begin{bmatrix} y_1 \\ y_2 - 2y_1 \\ y_3 - 2y_2 + 2y_1 \end{bmatrix}$$
and conversely,
$$y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = B\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 + 2\epsilon_1 \\ \epsilon_3 + 2\epsilon_2 + 2\epsilon_1 \end{bmatrix}$$
For the estimation part, we calculate the $h$ and $g$ weights using the formulas
$$g = D^{-1}B^{-1}r = \begin{bmatrix} 2 \\ -4 \\ 1 \end{bmatrix}, \qquad h = B^{-T}g = R^{-1}r = \begin{bmatrix} 12 \\ -6 \\ 1 \end{bmatrix}$$

11. The three $g$ weights are the optimal weights for the lower-order estimation problems, that is,
$$\hat x_1 = g_1\epsilon_1 = 2\epsilon_1$$
$$\hat x_2 = g_1\epsilon_1 + g_2\epsilon_2 = 2\epsilon_1 - 4\epsilon_2$$
$$\hat x_3 = g_1\epsilon_1 + g_2\epsilon_2 + g_3\epsilon_3 = 2\epsilon_1 - 4\epsilon_2 + \epsilon_3$$
Replacing the $\epsilon_i$ in terms of the $y_i$, we get
$$\hat x_1 = 2y_1$$
$$\hat x_2 = 2y_1 - 4(y_2 - 2y_1) = 10y_1 - 4y_2$$
$$\hat x_3 = 10y_1 - 4y_2 + (y_3 - 2y_2 + 2y_1) = 12y_1 - 6y_2 + y_3$$
For the mean square errors, using the variances of the $\epsilon_i$, $\{E[\epsilon_1^2], E[\epsilon_2^2], E[\epsilon_3^2]\} = \{2, 1, 2\}$, and starting with $\mathcal{E}_0 = \sigma_x^2 = 30$, we get
$$\mathcal{E}_1 = \mathcal{E}_0 - g_1^2 E[\epsilon_1^2] = 30 - 2^2\cdot 2 = 22$$
$$\mathcal{E}_2 = \mathcal{E}_1 - g_2^2 E[\epsilon_2^2] = 22 - (-4)^2\cdot 1 = 6$$
$$\mathcal{E}_3 = \mathcal{E}_2 - g_3^2 E[\epsilon_3^2] = 6 - 1^2\cdot 2 = 4$$
For the prediction, we want to show that the $a_{ij}$ coefficients are the matrix elements of $B^{-1}$. This can be seen in general by writing the expressions of the $\epsilon_i$ in terms of the $y_i$, as follows:
$$\epsilon_1 = y_1$$
$$\epsilon_2 = y_2 - \hat y_{2/1} = y_2 + a_{21}y_1$$
$$\epsilon_3 = y_3 - \hat y_{3/2} = y_3 + a_{32}y_2 + a_{31}y_1$$
which is equivalent to $\epsilon = B^{-1}y$ with
$$B^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ a_{21} & 1 & 0 \\ a_{31} & a_{32} & 1 \end{bmatrix}$$
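The numbers quoted in the solutions of Problems 10 and 11 can be re-derived mechanically. The sketch below (not part of the solutions, NumPy assumed) starts from the $B$, $D$, $r$, and $\sigma_x^2$ values stated above and reproduces the weights and the error sequence.

```python
# Check the factorization R = B D B^T, the weights g and h, and the error
# recursion E_i = E_{i-1} - g_i^2 * E[eps_i^2].
import numpy as np

B = np.array([[1.0, 0.0, 0.0],
              [2.0, 1.0, 0.0],
              [2.0, 2.0, 1.0]])
D = np.diag([2.0, 1.0, 2.0])            # innovations variances E[eps_i^2]
r = np.array([4.0, 4.0, 2.0])           # cross-correlation E[x y]
sigma_x2 = 30.0

R = B @ D @ B.T
g = np.linalg.solve(D, np.linalg.solve(B, r))    # g = D^{-1} B^{-1} r
h = np.linalg.solve(R, r)                        # h = R^{-1} r

E = [sigma_x2]
for gi, Di in zip(g, np.diag(D)):
    E.append(E[-1] - gi**2 * Di)

print(g, h)        # [2, -4, 1] and [12, -6, 1]
print(E[1:])       # [22.0, 6.0, 4.0]
```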

12. The difference equation is
$$y_n - 2\cos\omega_1\,y_{n-1} + y_{n-2} = 0$$
Indeed, using $y_n = A\cos(\omega_1 n + \phi)$, we have
$$2\cos\omega_1\,y_{n-1} = 2A\cos\omega_1\cos\big(\omega_1(n-1) + \phi\big) = A\cos(\omega_1 n + \phi) + A\cos(\omega_1 n - 2\omega_1 + \phi) = y_n + y_{n-2}$$
where we used the trig identity
$$2\cos a\cos b = \cos(a+b) + \cos(a-b)$$
Using this trig identity again, we obtain for the autocorrelation function:
$$R(k) = E[y_{n+k}y_n] = A^2 E\big[\cos(\omega_1 n + \omega_1 k + \phi)\cos(\omega_1 n + \phi)\big] = \tfrac{1}{2}A^2 E\big[\cos(2\omega_1 n + \omega_1 k + 2\phi) + \cos(\omega_1 k)\big] = \tfrac{1}{2}A^2\cos(\omega_1 k)$$
where the first expectation value is zero, as follows from the property
$$E[\cos(2\phi + \theta)] = \int_0^{2\pi}\cos(2\phi + \theta)\,\frac{d\phi}{2\pi} = 0$$
for $\phi$ uniform over $[0, 2\pi)$ and $\theta$ deterministic. The $3\times 3$ autocorrelation matrix will be $R_{ij} = E[y_i y_j] = R(i-j)$. Noting that $R(i-j) = R(j-i)$, we find
$$R = \begin{bmatrix} R(0) & R(1) & R(2) \\ R(1) & R(0) & R(1) \\ R(2) & R(1) & R(0) \end{bmatrix} = \frac{A^2}{2}\begin{bmatrix} 1 & \cos\omega_1 & \cos 2\omega_1 \\ \cos\omega_1 & 1 & \cos\omega_1 \\ \cos 2\omega_1 & \cos\omega_1 & 1 \end{bmatrix}$$
Its determinant is
$$\det R = \frac{A^6}{8}\big[1 + 2\cos^2\omega_1\cos 2\omega_1 - \cos^2 2\omega_1 - 2\cos^2\omega_1\big]$$
Using the trig identity $\cos 2\omega_1 = 2\cos^2\omega_1 - 1$, we can verify that the expression in the brackets vanishes. The same result also follows from the observation that the rank of $R$ is two, not three, because each column can be expressed as a linear combination of the other two; for example, the first column is expressible as
$$\begin{bmatrix} 1 \\ \cos\omega_1 \\ \cos 2\omega_1 \end{bmatrix} = 2\cos\omega_1\begin{bmatrix} \cos\omega_1 \\ 1 \\ \cos\omega_1 \end{bmatrix} - \begin{bmatrix} \cos 2\omega_1 \\ \cos\omega_1 \\ 1 \end{bmatrix}$$
The Gram-Schmidt construction proceeds as follows:
$$\epsilon_0 = y_0, \qquad \epsilon_1 = y_1 - b_{10}\epsilon_0, \qquad \epsilon_2 = y_2 - b_{20}\epsilon_0 - b_{21}\epsilon_1$$
where
$$b_{10} = \frac{E[y_1\epsilon_0]}{E[y_0 y_0]} = \frac{R(1)}{R(0)} = \cos\omega_1, \qquad b_{20} = \frac{E[y_2\epsilon_0]}{E_0} = \frac{R(2)}{R(0)} = \cos 2\omega_1$$
The quantity $E_1 = E[\epsilon_1^2]$ is calculated by squaring the expression $y_1 = \epsilon_1 + b_{10}\epsilon_0$, taking expectations of both sides, and using $E_0 = E[\epsilon_0^2] = R(0)$:
$$R(0) = E[y_1^2] = E_1 + b_{10}^2 E_0 \quad\Longrightarrow\quad E_1 = R(0)\big(1 - b_{10}^2\big) = R(0)\big(1 - \cos^2\omega_1\big) = R(0)\sin^2\omega_1$$
Similarly, we find
$$b_{21} = \frac{E[y_2\epsilon_1]}{E_1} = \frac{E[y_2(y_1 - b_{10}y_0)]}{E_1} = \frac{R(1) - b_{10}R(2)}{E_1} = \frac{\cos\omega_1 - \cos\omega_1\cos 2\omega_1}{\sin^2\omega_1} = \frac{\cos\omega_1\,(2\sin^2\omega_1)}{\sin^2\omega_1} = 2\cos\omega_1$$
Thus, the $B$ matrix will be

$$B = \begin{bmatrix} 1 & 0 & 0 \\ b_{10} & 1 & 0 \\ b_{20} & b_{21} & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ \cos\omega_1 & 1 & 0 \\ \cos 2\omega_1 & 2\cos\omega_1 & 1 \end{bmatrix}$$
The prediction error $E_2$ is expected to be zero, because $y_2$ can be predicted exactly from $\{y_0, y_1\}$, as follows from the difference equation applied with $n = 2$:
$$y_2 - 2\cos\omega_1\,y_1 + y_0 = 0$$
Indeed, squaring the equation $y_2 = \epsilon_2 + b_{20}\epsilon_0 + b_{21}\epsilon_1$ and taking expectations of both sides, we get
$$R(0) = E[y_2^2] = E_2 + b_{20}^2 E_0 + b_{21}^2 E_1$$
and solving for $E_2$:
$$E_2 = R(0) - b_{20}^2 E_0 - b_{21}^2 E_1 = R(0) - \cos^2 2\omega_1\,R(0) - 4\cos^2\omega_1\sin^2\omega_1\,R(0) = \sin^2 2\omega_1\,R(0) - \sin^2 2\omega_1\,R(0) = 0$$
Thus, the $D$ matrix will be
$$D = \begin{bmatrix} E_0 & 0 & 0 \\ 0 & E_1 & 0 \\ 0 & 0 & E_2 \end{bmatrix} = R(0)\begin{bmatrix} 1 & 0 & 0 \\ 0 & \sin^2\omega_1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
Finally, one should be able to verify the Cholesky factorization $R = BDB^T$, which in this case reads as follows (we removed an overall factor of $R(0)$):
$$\begin{bmatrix} 1 & \cos\omega_1 & \cos 2\omega_1 \\ \cos\omega_1 & 1 & \cos\omega_1 \\ \cos 2\omega_1 & \cos\omega_1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ \cos\omega_1 & 1 & 0 \\ \cos 2\omega_1 & 2\cos\omega_1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & \sin^2\omega_1 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & \cos\omega_1 & \cos 2\omega_1 \\ 0 & 1 & 2\cos\omega_1 \\ 0 & 0 & 1 \end{bmatrix}$$

13. Part (a) follows from part (b) and stationarity. Indeed,
$$E[y_{n+k}y_n]^2 \le E[y_{n+k}^2]\,E[y_n^2] \quad\Longrightarrow\quad R(k)^2 \le R(0)R(0)$$
or, $|R(k)| \le R(0)$. Part (b) can be derived as follows. The autocorrelation matrix of $y = \begin{bmatrix} u \\ v \end{bmatrix}$ is
$$R = E[yy^T] = E\Big[\begin{bmatrix} u \\ v \end{bmatrix}[u,\ v]\Big] = \begin{bmatrix} E[u^2] & E[uv] \\ E[vu] & E[v^2] \end{bmatrix}$$
Because this matrix is positive semi-definite, its determinant will be non-negative, that is,
$$\det R = E[u^2]E[v^2] - E[uv]^2 \ge 0$$
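The factorization found in the solution of Problem 12 can be confirmed numerically for $\omega_1 = \pi/3$. A short sketch follows (not part of the solutions, NumPy assumed).

```python
# The 3x3 autocorrelation matrix R(k) = 2*cos(w1*k) is singular, and B D B^T
# with the coefficients found above reproduces it.
import numpy as np

w1 = np.pi / 3
R = 2.0 * np.array([[np.cos(w1 * abs(i - j)) for j in range(3)] for i in range(3)])
print(np.linalg.det(R))                  # ~ 0 up to round-off

B = np.array([[1.0,            0.0,              0.0],
              [np.cos(w1),     1.0,              0.0],
              [np.cos(2 * w1), 2.0 * np.cos(w1), 1.0]])
D = 2.0 * np.diag([1.0, np.sin(w1)**2, 0.0])    # E_0, E_1, E_2 (note E_2 = 0)
print(np.allclose(B @ D @ B.T, R))              # True
```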


More information

Stochastic Processes. M. Sami Fadali Professor of Electrical Engineering University of Nevada, Reno

Stochastic Processes. M. Sami Fadali Professor of Electrical Engineering University of Nevada, Reno Stochastic Processes M. Sami Fadali Professor of Electrical Engineering University of Nevada, Reno 1 Outline Stochastic (random) processes. Autocorrelation. Crosscorrelation. Spectral density function.

More information

Regression and Statistical Inference

Regression and Statistical Inference Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF

More information

Lecture 1: August 28

Lecture 1: August 28 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 1: August 28 Our broad goal for the first few lectures is to try to understand the behaviour of sums of independent random

More information

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed

More information

Vectors and Matrices Statistics with Vectors and Matrices

Vectors and Matrices Statistics with Vectors and Matrices Vectors and Matrices Statistics with Vectors and Matrices Lecture 3 September 7, 005 Analysis Lecture #3-9/7/005 Slide 1 of 55 Today s Lecture Vectors and Matrices (Supplement A - augmented with SAS proc

More information

APPM/MATH 4/5520 Solutions to Exam I Review Problems. f X 1,X 2. 2e x 1 x 2. = x 2

APPM/MATH 4/5520 Solutions to Exam I Review Problems. f X 1,X 2. 2e x 1 x 2. = x 2 APPM/MATH 4/5520 Solutions to Exam I Review Problems. (a) f X (x ) f X,X 2 (x,x 2 )dx 2 x 2e x x 2 dx 2 2e 2x x was below x 2, but when marginalizing out x 2, we ran it over all values from 0 to and so

More information

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2015 Soleymani Dimensionality Reduction: Feature Selection vs. Feature Extraction Feature selection Select a subset of a given

More information

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416)

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) D. ARAPURA This is a summary of the essential material covered so far. The final will be cumulative. I ve also included some review problems

More information

Numerical Algorithms as Dynamical Systems

Numerical Algorithms as Dynamical Systems A Study on Numerical Algorithms as Dynamical Systems Moody Chu North Carolina State University What This Study Is About? To recast many numerical algorithms as special dynamical systems, whence to derive

More information

Probability Space. J. McNames Portland State University ECE 538/638 Stochastic Signals Ver

Probability Space. J. McNames Portland State University ECE 538/638 Stochastic Signals Ver Stochastic Signals Overview Definitions Second order statistics Stationarity and ergodicity Random signal variability Power spectral density Linear systems with stationary inputs Random signal memory Correlation

More information

Further Mathematical Methods (Linear Algebra)

Further Mathematical Methods (Linear Algebra) Further Mathematical Methods (Linear Algebra) Solutions For The 2 Examination Question (a) For a non-empty subset W of V to be a subspace of V we require that for all vectors x y W and all scalars α R:

More information

for valid PSD. PART B (Answer all five units, 5 X 10 = 50 Marks) UNIT I

for valid PSD. PART B (Answer all five units, 5 X 10 = 50 Marks) UNIT I Code: 15A04304 R15 B.Tech II Year I Semester (R15) Regular Examinations November/December 016 PROBABILITY THEY & STOCHASTIC PROCESSES (Electronics and Communication Engineering) Time: 3 hours Max. Marks:

More information

Problem Set 1 Sept, 14

Problem Set 1 Sept, 14 EE6: Random Processes in Systems Lecturer: Jean C. Walrand Problem Set Sept, 4 Fall 06 GSI: Assane Gueye This problem set essentially reviews notions of conditional expectation, conditional distribution,

More information

Exercises with solutions (Set D)

Exercises with solutions (Set D) Exercises with solutions Set D. A fair die is rolled at the same time as a fair coin is tossed. Let A be the number on the upper surface of the die and let B describe the outcome of the coin toss, where

More information

Expectation. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Expectation. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Expectation DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Aim Describe random variables with a few numbers: mean, variance,

More information

Math Linear Algebra II. 1. Inner Products and Norms

Math Linear Algebra II. 1. Inner Products and Norms Math 342 - Linear Algebra II Notes 1. Inner Products and Norms One knows from a basic introduction to vectors in R n Math 254 at OSU) that the length of a vector x = x 1 x 2... x n ) T R n, denoted x,

More information

MTH739U/P: Topics in Scientific Computing Autumn 2016 Week 6

MTH739U/P: Topics in Scientific Computing Autumn 2016 Week 6 MTH739U/P: Topics in Scientific Computing Autumn 16 Week 6 4.5 Generic algorithms for non-uniform variates We have seen that sampling from a uniform distribution in [, 1] is a relatively straightforward

More information

Expectation. DS GA 1002 Probability and Statistics for Data Science. Carlos Fernandez-Granda

Expectation. DS GA 1002 Probability and Statistics for Data Science.   Carlos Fernandez-Granda Expectation DS GA 1002 Probability and Statistics for Data Science http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall17 Carlos Fernandez-Granda Aim Describe random variables with a few numbers: mean,

More information

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators Estimation theory Parametric estimation Properties of estimators Minimum variance estimator Cramer-Rao bound Maximum likelihood estimators Confidence intervals Bayesian estimation 1 Random Variables Let

More information

The Multivariate Normal Distribution. In this case according to our theorem

The Multivariate Normal Distribution. In this case according to our theorem The Multivariate Normal Distribution Defn: Z R 1 N(0, 1) iff f Z (z) = 1 2π e z2 /2. Defn: Z R p MV N p (0, I) if and only if Z = (Z 1,..., Z p ) T with the Z i independent and each Z i N(0, 1). In this

More information

. Find E(V ) and var(v ).

. Find E(V ) and var(v ). Math 6382/6383: Probability Models and Mathematical Statistics Sample Preliminary Exam Questions 1. A person tosses a fair coin until she obtains 2 heads in a row. She then tosses a fair die the same number

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

5.6. PSEUDOINVERSES 101. A H w.

5.6. PSEUDOINVERSES 101. A H w. 5.6. PSEUDOINVERSES 0 Corollary 5.6.4. If A is a matrix such that A H A is invertible, then the least-squares solution to Av = w is v = A H A ) A H w. The matrix A H A ) A H is the left inverse of A and

More information

MAS223 Statistical Inference and Modelling Exercises

MAS223 Statistical Inference and Modelling Exercises MAS223 Statistical Inference and Modelling Exercises The exercises are grouped into sections, corresponding to chapters of the lecture notes Within each section exercises are divided into warm-up questions,

More information

Statistical Signal Processing Detection, Estimation, and Time Series Analysis

Statistical Signal Processing Detection, Estimation, and Time Series Analysis Statistical Signal Processing Detection, Estimation, and Time Series Analysis Louis L. Scharf University of Colorado at Boulder with Cedric Demeure collaborating on Chapters 10 and 11 A TT ADDISON-WESLEY

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition A Brief Mathematical Review Hamid R. Rabiee Jafar Muhammadi, Ali Jalali, Alireza Ghasemi Spring 2012 http://ce.sharif.edu/courses/90-91/2/ce725-1/ Agenda Probability theory

More information

HOMEWORK PROBLEMS FROM STRANG S LINEAR ALGEBRA AND ITS APPLICATIONS (4TH EDITION)

HOMEWORK PROBLEMS FROM STRANG S LINEAR ALGEBRA AND ITS APPLICATIONS (4TH EDITION) HOMEWORK PROBLEMS FROM STRANG S LINEAR ALGEBRA AND ITS APPLICATIONS (4TH EDITION) PROFESSOR STEVEN MILLER: BROWN UNIVERSITY: SPRING 2007 1. CHAPTER 1: MATRICES AND GAUSSIAN ELIMINATION Page 9, # 3: Describe

More information

Statistical signal processing

Statistical signal processing Statistical signal processing Short overview of the fundamentals Outline Random variables Random processes Stationarity Ergodicity Spectral analysis Random variable and processes Intuition: A random variable

More information

STATISTICAL METHODS FOR SIGNAL PROCESSING c Alfred Hero

STATISTICAL METHODS FOR SIGNAL PROCESSING c Alfred Hero STATISTICAL METHODS FOR SIGNAL PROCESSING c Alfred Hero 1999 32 Statistic used Meaning in plain english Reduction ratio T (X) [X 1,..., X n ] T, entire data sample RR 1 T (X) [X (1),..., X (n) ] T, rank

More information